EvoMiner: Frequent Subtree Mining in Phylogenetic Databases
Technical Report #11-08, Dept. of Computer Science, Iowa State University
Abstract
The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method. It uses a novel phylogeny-specific constant-time candidate generation scheme and an efficient fingerprinting-based technique for the downward closure operation. It also uses a lowest common ancestor (LCA) based support counting step that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of up to 100 times or more over Phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also run in a vertical mining mode, which uses less memory at the expense of speed.
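The abstract does not spell out EvoMiner's support-counting procedure. The sketch below assumes rooted trees over a common leaf set. It also assumes the usual rule that a subtree is frequent when it occurs in at least a fraction f of the input trees. Under those assumptions, it shows how LCA queries alone can decide whether a three-leaf subtree (the rooted triplet ab|c) occurs in a tree. It also shows how a first level-wise pass might keep only the frequent triplets. The parent-map representation and the names `support`, `frequent_triplets` and `f` are hypothetical. The naive LCA walk stands in for a faster index, and none of this should be read as EvoMiner's actual implementation.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: a rooted tree is a child -> parent map,
# with the root mapped to None. Leaves are labels shared by all trees.
ParentMap = Dict[str, Optional[str]]
Triplet = Tuple[str, str, str]  # (a, b, c) encodes the rooted triplet ab|c


def node_depths(parent: ParentMap) -> Dict[str, int]:
    """Depth of every node; the root has depth 0."""
    depth: Dict[str, int] = {}

    def walk(v: str) -> int:
        if v not in depth:
            p = parent[v]
            depth[v] = 0 if p is None else walk(p) + 1
        return depth[v]

    for v in parent:
        walk(v)
    return depth


def lca(parent: ParentMap, depth: Dict[str, int], u: str, v: str) -> str:
    """Naive lowest common ancestor: lift the deeper node, then both together."""
    while depth[u] > depth[v]:
        u = parent[u]
    while depth[v] > depth[u]:
        v = parent[v]
    while u != v:
        u, v = parent[u], parent[v]
    return u


def triplet_holds(parent: ParentMap, depth: Dict[str, int], t: Triplet) -> bool:
    """ab|c is displayed iff lca(a, b) lies strictly below lca(a, c)."""
    a, b, c = t
    return depth[lca(parent, depth, a, b)] > depth[lca(parent, depth, a, c)]


def support(trees: List[ParentMap], t: Triplet) -> float:
    """Fraction of input trees that display the triplet t."""
    hits = sum(triplet_holds(p, node_depths(p), t) for p in trees)
    return hits / len(trees)


def frequent_triplets(trees: List[ParentMap], leaves: List[str], f: float) -> List[Triplet]:
    """First level of a level-wise search: every triplet with support >= f."""
    depths = [node_depths(p) for p in trees]  # computed once per tree
    result: List[Triplet] = []
    for x, y, z in combinations(sorted(leaves), 3):
        # The three resolved triplets on {x, y, z}: xy|z, xz|y, yz|x.
        for t in ((x, y, z), (x, z, y), (y, z, x)):
            hits = sum(triplet_holds(p, d, t) for p, d in zip(trees, depths))
            if hits / len(trees) >= f:
                result.append(t)
    return result


# Example trees: ((A,B),(C,D)), ((A,C),(B,D)), (((A,B),C),D)
t1 = {"r": None, "u": "r", "v": "r", "A": "u", "B": "u", "C": "v", "D": "v"}
t2 = {"r": None, "u": "r", "v": "r", "A": "u", "C": "u", "B": "v", "D": "v"}
t3 = {"r": None, "w": "r", "D": "r", "u": "w", "C": "w", "A": "u", "B": "u"}
trees = [t1, t2, t3]
print(support(trees, ("A", "B", "C")))  # AB|C is displayed by 2 of the 3 trees
print(frequent_triplets(trees, ["A", "B", "C", "D"], 0.6))
# -> [('A', 'B', 'C'), ('A', 'B', 'D'), ('A', 'C', 'D')]
```

A faster version would replace the naive walk with a constant-time LCA index, such as an Euler tour combined with a sparse table. Each triplet test would then cost O(1) after linear preprocessing per tree.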
Similar resources
Enumerating All Maximal Frequent Subtrees
Given a collection of leaf-labeled trees on a common leafset and a fraction f ∈ (1/2, 1], a frequent subtree (FST) is a subtree isomorphically included in at least a fraction f of the input trees. The well-known maximum agreement subtree (MAST) problem identifies the FST with f = 1 that has the largest number of leaves. Apart from its intrinsic interest from the algorithmic perspective, MAST has pr...
Preserving Separation of Concerns Through Compilation
Current aspect-oriented (AO) compilation techniques fail to preserve the separation of concerns for postcompilation phases. At a minimum, this makes efficient incremental compilation and unit testing of AO programs challenging. The contribution of this work is an improved approach for aspect-oriented compilation. Our approach rests on a new interface between the AO high-level language (HLL) com...
Frequent Subtree Mining - An Overview
Mining frequent subtrees from databases of labeled trees is a new research field that has many practical applications in areas such as computer networks, Web mining, bioinformatics, XML document mining, etc. These applications share a requirement for the more expressive power of labeled trees to capture the complex relations among data entities. Although frequent subtree mining is a more diffic...
A Bibliography and Index of Our Works on Belief Data: Concept of Error and Multilevel Security
In 1988 we initiated our work on belief data. The work proceeded in two phases: in the first phase we formalized the concept of error in everyday record keeping, and in the second phase we considered multilevel security. The purpose of this report is to raise awareness of our work on belief data and to serve as a guide to the following manuscripts. The first two manuscripts are on the ...
Optimal and Approximate Approaches for Selecting Proxy Agents in Mobile Network Backbones
Selecting Proxy Agents in Mobile Network Backbones. Ahmed Kamal, Dept. of Electrical & Comp. Eng., Iowa State University, Ames, Iowa 50011-3060, [email protected]; Hesham El-Rewini, Department of Computer Science & Engineering, Southern Methodist University, Dallas, Texas 75275-0122, [email protected]; Raza Ul-Mustafa, Dept. of Electrical & Comp. Eng., Iowa State University, Ames, Iowa 50011-3060, raza@iastat...
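The FST definition in the first related entry above (Enumerating All Maximal Frequent Subtrees) can be made concrete with a small sketch. It assumes rooted, leaf-labeled trees without unary internal nodes, written as nested Python tuples. A candidate S on leaf set L' is included in a tree T exactly when T restricted to L' has the same clusters as S. The clusters of that restriction are the nonempty intersections of T's clusters with L'. The tuple encoding and the names `is_included` and `is_frequent` are hypothetical. The code only illustrates the definition and is not the enumeration algorithm that work proposes.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical encoding: a leaf is a string, an internal node is a tuple of
# two or more subtrees, e.g. (("A", "B"), ("C", "D")).
Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def clusters(tree: Tree) -> Set[Cluster]:
    """All clusters (leaf sets below each node), including singletons and the root."""
    found: Set[Cluster] = set()

    def walk(node: Tree) -> Cluster:
        if isinstance(node, str):
            leafset: Cluster = frozenset([node])
        else:
            leafset = frozenset().union(*(walk(child) for child in node))
        found.add(leafset)
        return leafset

    walk(tree)
    return found


def restricted_clusters(tree: Tree, leaves: Cluster) -> Set[Cluster]:
    """Clusters of `tree` restricted to `leaves`: the nonempty intersections."""
    return {c & leaves for c in clusters(tree) if c & leaves}


def is_included(candidate: Tree, tree: Tree) -> bool:
    """True if `candidate` equals `tree` restricted to the candidate's leaves."""
    cand = clusters(candidate)
    leaves = max(cand, key=len)  # the root cluster is the candidate's leaf set
    return restricted_clusters(tree, leaves) == cand


def is_frequent(candidate: Tree, trees: List[Tree], f: float) -> bool:
    """Frequent if included in at least a fraction f of the input trees."""
    hits = sum(is_included(candidate, t) for t in trees)
    return hits / len(trees) >= f


trees: List[Tree] = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), ((("A", "B"), "C"), "D")]
s = (("A", "B"), "C")
print([is_included(s, t) for t in trees])  # [True, False, True]
print(is_frequent(s, trees, 0.6))  # True: 2 of 3 trees >= 0.6
```

Comparing cluster sets avoids an explicit isomorphism search. A rooted tree with no unary nodes is determined by its clusters, so equal cluster sets imply label-preserving isomorphism.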
Publication date: 2011